The new-generation dating app Known replaces the shallow 'swipe left or right' matching model with AI voice conversations. By guiding users through in-depth conversations, it raises the share of initial introductions that turn into offline dates to 80%, far exceeding the sub-10% meeting rate of traditional apps. The company recently secured $9.7 million in funding, with participation from multiple institutions, including Forerunner.
Google has launched the StreetReaderAI prototype system, helping blind and low-vision users to independently explore Google Street View through natural language interaction. The system integrates computer vision, geographic information systems, and large language models, enabling a multimodal AI-driven real-time conversational street view experience, breaking through the limitations of traditional voice announcements and enhancing the freedom of accessible urban exploration.
[AI Daily Summary] Today's AI field saw multiple breakthroughs: 1) Moonshot AI's Kimi Open Platform launched Playground, upgrading AI from a conversational assistant to an intelligent assistant; 2) OpenAI released ChatGPT Agent, capable of performing tasks autonomously; 3) Suno v4.5+ introduced innovative music features such as voice replacement; 4) Google's Veo3 video generation model opened its API but at a high cost; 5) The first real-time video conversion AI model MirageLSD was introduced; 6) VSC
Mistral AI's chatbot Le Chat has received a major update with five core features: 1) a Deep Research mode that breaks down complex questions and generates structured reports; 2) voice input based on the Voxtral model for natural conversation; 3) a Thinking mode that uses the Magistral model to handle complex reasoning; 4) a text-based image editing feature developed in collaboration with Black Forest Labs; 5) new project management tools to organize conversations and files. These features are now available on both web and mobile platforms.
A new foundational voice-to-voice model that delivers a human-like conversation experience.
Engage in natural voice conversations with large language models.
Anthropic
$7
Input tokens/M
$35
Output tokens/M
200
Context Length
Alibaba
-
$3.9
$15.2
64
Bytedance
$0.8
$2
128
Tencent
$1
$4
32
Deepseek
$12
$16
$3.5
$2.4
8
$0.3
Iflytek
Baidu
$3
$9
$1.6
$10
Marvis-AI
Marvis is an advanced conversational voice model designed for real-time streaming text-to-speech synthesis. It focuses on efficiency and ease of use, supporting high-quality real-time voice synthesis on consumer devices such as Apple Silicon Macs, iPhones, and iPads.
webbigdata
VoiceCore is a commercially available Japanese voice AI agent model that focuses on enabling AI to have natural conversations with humans through voice. It has the ability to express emotions and non-verbal sounds and supports multiple voice style selections.
thomasgauthier
Hugging Face implementation of Sesame Technology's Conversational Speech Model (CSM), supporting text-to-speech and voice cloning tasks
gpt-omni
Mini-Omni2 is a fully interactive multimodal model capable of understanding image, audio, and text inputs, and engaging in end-to-end voice conversations with users.
An intelligent chatbot project based on large models. It supports multi-platform access and multiple AI models, handles text, voice, and image processing, can be extended with plugins, and can be used to build customized enterprise AI applications.